OLAP Execution Model Using Relational Operations

ABSTRACT

In one embodiment the present invention includes an OLAP execution model using relational operations. In one embodiment, the present invention includes, a method comprising receiving a first query in an online analytic processor (OLAP) executing on one or more computers, the OLAP generating and comprising a model specifying a graph defining a plurality of nodes and a plurality of tiers, each node corresponding to a different operation on data. A second query is generated by the OLAP. The second query includes a plurality of layered subqueries each corresponding to one of the nodes in the graph for specifying the different operations on data. The second query is received in a relational engine coupled to the datastore. The relational engine executes the second query, and in accordance therewith, retrieves data.

CROSS REFERENCE TO RELATED APPLICATIONS

The present disclosure is a divisional of and claims priority to U.S.patent application Ser. No. 12/951,948 for “OLAP Execution Model UsingRelational Operations” filed Nov. 22, 2010, which is hereby incorporatedherein by reference in its entirety for all purposes.

BACKGROUND

The present invention relates to computing and data processing, and inparticular, to on-line analytic processing execution model usingrelational operations.

Unless otherwise indicated herein, the approaches described in thissection are not prior art to the claims in this application and are notadmitted to be prior art by inclusion in this section.

FIG. 1 shows a conventional data storage and analysis platform. Computersystem 101 may include a data warehousing component 102 to store largevolumes of historical data about an enterprise, for example. Datawarehouse 102 may include loading (staging) components, formatconversion components, and/or data consistency components, for example.Data warehouse 102 may load data into a datastore 105 through arelational engine 103. Datastore 105 may include one or more databaseand/or database management components. In a data warehousingapplication, as well as other data storage applications, data is storedas a star schema 106. A star schema stores data in one or more facttables and multiple dimension tables.

Users may ask questions that are answerable using the stored data bysubmitting queries. Typically, queries are submitted to an on-lineanalytics processing component (OLAP) 107 executing on computer system101. OLAP 107 may receive a query, such as a business question, andcommunicate with relational engine 103 to obtain the data necessary foranswering the query. Some OLAP engines retrieve granular data fromdatastore 103 using a single query to relational engine 103 and cache orotherwise store large volumes of data locally to generate answers toOLAP queries. One disadvantage to this approach is that large volumes ofdata must be retrieved and updated regularly to maintain the accuracy ofthe OLAP query results. Other OLAP engines query cells in the datastoreseparately and at a granular level based on the input OLAP query. Onedisadvantage of this approach is the complexity required to generategranular queries and combine the results in the OLAP engine. Anotherdisadvantage with this approach is that for complex user questions thenumber of cells can get very high and therefore the overall responsetime becomes very bad.

Thus, there is a need for improved technique for responding to OLAPqueries. The present invention solves these and other problems byproviding an OLAP execution model using relational operations.

SUMMARY

Embodiments of the present invention improve queries in systems usingOLAP. In one embodiment, the present invention includes a methodcomprising receiving a first query in an online analytic processor(OLAP) executing on one or more computers, generating a model, the modelspecifying a graph defining a plurality of nodes and a plurality oftiers, each node corresponding to a different operation on data,generating a second query based on the model, the second query includinga plurality of layered subqueries each corresponding to one of the nodesin the graph for specifying the different operations on the data, andreceiving the second query in a relational engine coupled to adatastore, wherein the relational engine executes the second query, andin accordance therewith, retrieves data.

In one embodiment, a first node on a highest tier in the modelcorresponds to a subquery comprising an exception aggregation, theexception aggregation associating a first aggregation type with a firstmeasure and a first dimension of a dimension table, the firstaggregation type being different than a default aggregation type used onthe first measure when aggregating said first measure across otherdimensions, wherein the second query comprises at least one SQLstatement corresponding to a second node on a tier below the highesttier nested in a SQL statement corresponding to the first query.

In one embodiment, the OLAP generates and comprises said model, andwherein the second query is generated by said OLAP.

In one embodiment, the plurality of nodes correspond to successiveoperations on the data in the database performed by the relationalengine.

In one embodiment, the plurality of nodes of said graph comprise a firstnode on a first tier corresponding to first data, wherein the first datais all the data required to answer the first query, the first nodehaving a corresponding first subquery for retrieving the first data, anda plurality second nodes on a plurality of tiers above the first tier,each second node corresponding to a different operation on the firstdata.

In one embodiment, the plurality of nodes includes a root node, aplurality of second tier nodes, and one or more higher tier nodes,wherein the root node has a corresponding first subquery to operate onraw data in the database to produce first data required to answer thefirst query, wherein the plurality of second tier nodes are coupled tothe root node in the graph and each have corresponding subqueries thatoperate on the first data, and wherein at least one of the second tiernodes is coupled to the one or more higher tier nodes, wherein eachhigher tier node has a corresponding subquery for operating on datagenerated by the subquery of the at least one second tier node.

In one embodiment, the datastore is a database, and wherein the data isstored in the database in at least one fact table and a plurality ofdimension tables.

In one embodiment, the second query is a nested SQL statement.

In one embodiment, each subquery comprises a SQL statement. For example,each subquery may be a database view containing a SQL statement.

In one embodiment, a portion of the nodes of the graph are representedby subqueries executed in the relational engine and one or more of thenodes in the graph correspond to data operations executed by said OLAP.

In one embodiment, the method further comprises returning said retrieveddata from the relational engine to the OLAP, processing the retrieveddata in the OLAP based on said one or more nodes in the graphcorresponding to data operations executed by said OLAP, and returning aresult to said first query.

In one embodiment, the graph specifies an execution plan whereoperations corresponding to lower tier nodes are performed on databefore operations corresponding to higher tier nodes and wherein dataresulting from an operation corresponding to a first node on a lowertier is operated on by a second higher tier node directly connected tothe first node in the graph.

In one embodiment, the present invention includes a system comprisingone or more computers, a database, an online analytic processor (OLAP),executing on the one or more computers, to receive a first query, amodel, the model specifying a graph defining a plurality of nodes on aplurality of tiers, each node corresponding to a different operation ondata, and a relational engine coupled to a datastore to receive thesecond query, where the relational engine executes a second query, andin accordance therewith, retrieves data. The OLAP generates the secondquery based on the model, the second query including a plurality oflayered subqueries each corresponding to one of the nodes in the graphfor specifying the different operations on data.

In another embodiment, the present invention includes non-transitorycomputer readable storage medium embodying a computer program forperforming a method, said method comprising receiving a first query inan online analytic processor (OLAP) executing on one or more computers,generating a model, the model specifying a graph defining a plurality ofnodes and a plurality of tiers, each node corresponding to a differentoperation on data, generating a second query based on the model, thesecond query including a plurality of layered subqueries eachcorresponding to one of the nodes in the graph for specifying thedifferent operations on the data, and receiving the second query in arelational engine coupled to a datastore, wherein the relational engineexecutes the second query, and in accordance therewith, retrieves data.

The following detailed description and accompanying drawings provide abetter understanding of the nature and advantages of the presentinvention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a conventional data storage and analysis platform.

FIG. 2 illustrates a system using an OLAP execution model according toone embodiment of the present invention.

FIG. 3A is an example graph to illustrate one embodiment of the presentinvention.

FIG. 3B shows successive data processing represented by the examplegraph in FIG. 3A to illustrate an embodiment of the present invention.

FIG. 4 illustrates a method according to one embodiment of the presentinvention.

FIG. 5 is another example graph to illustrate an embodiment of thepresent invention.

FIG. 6 illustrates hardware of a special purpose computing machineconfigured with a process according to one embodiment of the presentinvention.

DETAILED DESCRIPTION

Described herein are techniques for an OLAP execution model usingrelational operations. The apparatuses, methods, and techniquesdescribed below may be implemented as a computer program (software)executing on one or more computers. The computer program may further bestored on a computer readable medium. The computer readable medium mayinclude instructions for performing the processes described below. Inthe following description, for purposes of explanation, numerousexamples and specific details are set forth in order to provide athorough understanding of the present invention. It will be evident,however, to one skilled in the art that the present invention as definedby the claims may include some or all of the features in these examplesalone or in combination with other features described below, and mayfurther include modifications and equivalents of the features andconcepts described herein.

FIG. 2 illustrates system using an OLAP execution model according to oneembodiment of the present invention. Computer system 201 includes anon-line analytic processor (OLAP) 202, relational engine 203, and datastore 204. Computer system 201 may include one or more computers thateach may have one or more microprocessor and memory, for example. Theterm computer system includes a single computer, multiple computers, oneor more client computers, one or more server computers, or portablecomputing devices, for example. OLAP 202, relational engine 203, anddata store 204 may be implemented in software and executed on computersystem 201. OLAP 202, relational engine 203, and data store 204 may beexecuted on the same computer or on different computing devices (e.g.,in a distributed system). Datastore 204 may include one or moredatabases that may include database management systems and may or maynot include dedicated hardware. Datastore 204 may include an in-memorydatabase and/or a database implemented on one or more hard disk drivesor other storage devices, for example.

OLAP 202 receives a query. The query may specify quantitative questionsthat are answerable using data in data store 204. For example, the querymay specify data fields that are capable of being derived from the rawdata in data store 204. Upon receipt of the query, OLAP 202 interactswith relational engine 203 to retrieve data from data store 204 togenerate results. In this example, data in data store 204 is stored as amultidimensional model, such as a star schema. Star schema 205 storesdata in at least one fact table and a plurality of dimension tables.Generally, data in data store 204 is stored in relational tables thatare accessed by relational engine 203. Relational engine may beoptimized for retrieving data from relational tables, for example.

Features and advantages of the present invention include a modelspecifying a graph 210. The model may be represented in a computer usinga variety of programming techniques, such as metadata or an objectmodel, for example. Graph model 210 (herein, “graph” or “model”)specifies successive operations of data from relational database 203 foranswering the OLAP query. Graph 210 defines a plurality of nodes and aplurality of tiers. Each node in graph 210 corresponds to a differentoperations on data stored in a database, for example. As described inmore detail below, each node corresponds to a subquery. Since the nodesare layered in tiers, the subqueries are correspondingly layered so thatthe nodes in the model define successive operations on the raw data inthe data store 204. Graph 210 may specify an execution plan, forexample, where operations corresponding to lower tier nodes areperformed on data before operations corresponding to higher tier nodes,and where data resulting from an operation corresponding to a first nodeon a lower tier is operated on by a second higher tier node directlyconnected to the first node in the graph. Accordingly, graph 210provides an execution model for translating the OLAP query intosubqueries that can be optimized by a relational engine 203 to improvethe efficiency of retrieving the data needed to answer the OLAP query.While graph 210 is illustrated here as residing in OLAP 202, it is to beunderstood that the graph may reside in relational engine 203 or betweenOLAP 202 and relational engine 203, for example. Techniques that may beuseful in connection with generating graph model 210, for example, aredisclosed in U.S. patent application Ser. No. 12/914,445 to ChristophWeyerhaeuser et al., entitled Database Calculation Engine, the entiredisclosure of which is hereby incorporated herein by reference.

Once the query is received by OLAP 202, the graph model 210 isgenerated. In this example, a second query is generated by OLAP 202based on graph model 210 and sent to relational engine 203. The secondquery may include a plurality of layered subqueries. The second querymay comprise SQL statements or, for example, a nested SQL statementcorresponding to the graph as described herein. Each layered subquerycorresponds to one of the nodes in the graph for specifying thedifferent subsets of data stored in the database. The second query isreceived in and executed by relational engine 203. Relational engine 203retrieves data specified by the second query and returns the data toOLAP 202. OLAP 202 returns a result in response to the OLAP query usingthe data returned by the second query.

FIG. 3A illustrates an example graph according to one embodiment of thepresent invention. Graph 300 includes nodes 301-310 (L0-L9) coupledtogether in layered tiers. Node L0 is the lowest tier or “root node.”The lowest tier may have only a single root node L0, for example. NodeL0 represents all the needed data from a relational database to answerany OLAP query that uses the graph model. L0 may correspond to requireddata in its most granular form in a relational table in a database, forexample, such as a single receipt from a transaction including atransaction ID, store name, customer, and price. L0 corresponds to arelational database query to retrieve all the data needed to answer oneor more OLAP queries, for example. Graph 300 includes a second tier ofnodes 302-303 (L1, L7-L8) directly coupled to node L0. Each node L1,L7-L8 may represent a further refinement (e.g., a subset or aggregation)of the data represented by L0. Data may be selected, aggregated, orotherwise operated on (i.e., transformed) across different nodes in thegraph. For example, L1 may correspond to a transformation of data fromL0. Accordingly, L1 may have a corresponding query that selects and/orprocesses data from the data generated by the L0 query. L7 maycorrespond to a different transformation of data from L0. Accordingly,L7 may have another corresponding query that selects and/or processesdata from the data generated by the L0 query. L8 and L9 may similarlyhave different corresponding queries that select and/or process the datafrom the L0 query.

In this example, graph 300 further includes a third tier of nodes at306-307 (L2 and L6). L2 and L6 may further refine the data selected bythe L1 query. For example, L2 may correspond to an operation on datafrom L1. Accordingly, L2 may have a corresponding query that processesdata from the data generated by the L1 query. Similarly, L6 maycorrespond to a different operation on data from L1. Accordingly, L6 mayhave a corresponding query that processes data from the data generatedby the L1 query. Further refinements on data represented by L2 may bemade at forth tier node 308 (L3) and two fifth tier nodes 309-310(L4-L5).

FIG. 3B illustrates successive data processing represented by theexample graph in FIG. 3A. In this illustrative diagram, queries 320-324(Q0-Q4) correspond to nodes L0-L4. Graph 300 corresponds to multiplelayered (or successive) queries on datastore 350. For example, nodesL0-L4 correspond to queries Q0-Q4, respectively. L0 corresponds to Q0producing data D1, which is a subset of all data store 350. L1corresponds to Q1 producing data D2, which results from operating ondata D1. Similarly, L2 corresponds to Q2 producing data D3, whichresults from operating on data D2. Likewise, L3 corresponds to Q3producing data D4, which results from operating on data D3. Finally, L4corresponds to Q4 producing data D5, which results from operating ondata D4. Each of the nodes may have corresponding queries that retrievedata for answering a portion of the OLAP query. Accordingly, the desiredOLAP query results are, in a sense, modeled in graph 300 so that thedata required to answer the OLAP query may be retrieved moreefficiently. It is to be understood that a particular graph maycorrespond to a single OLAP query or multiple OLAP queries.

FIG. 4 illustrates a method according to one embodiment of the presentinvention. At 401, a first query is received in an OLAP softwarecomponent executing on a computer system. In this example, the OLAPgenerates a model specifying a graph defining a plurality of nodes and aplurality of tiers. Each node corresponds to a different operation ondata. At 402, the graph is generated by the OLAP. At 403, a second queryis generated by the OLAP based on the model. The second query mayinclude a plurality of layered subqueries each corresponding to one ofthe nodes in the graph for specifying the different operations on data.In some applications, all the nodes in the graph may have correspondingsubqueries that are processed by a relational engine. However, in otherembodiments only a portion of the nodes of the graph are represented bysubqueries executed in the relational engine, and one or more of thenodes in the graph correspond to data operations executed by the OLAP.At 404, the second query is executed in a relational engine coupled to adatabase. Data in the database is accessed according to the second queryand subqueries and retrieved. Data is returned from the relationalengine to the OLAP. Alternatively, database views (e.g., database viewscontaining SQL statements) may be stored inside the relational engineand then queries may be executed against the database views to retrievethe data. At 407, the OLAP may further process the returned data. Forexample, the OLAP may format the data or the OLAP may further operate onthe data based on nodes in the graph that are performed in the OLAP, forexample. At 408, the OLAP returns a result to the first query.

In one embodiment, the model separates the application problem (e.g.,the OLAP problem) from the technical expression of the execution. Ofcourse, a variety of optimizations in the graph may be implemented basedon the application using relational database optimization techniques. Inthe above example, layers may be combined using grouping sets. Anotherexample is that filters in upper layers can be pushed down to lowerlayers. Not necessarily all layers within the graph need to be pusheddown to a relational processing engine. In some implementations, somelayers inside the graph are processed by the OLAP engine itself. Alldata that is not within the reach of the relational engine (e.g. datafrom remote data accesses etc. . . . ) may be temporarily placed into aworkspace of the relational engine using temporary tables. Accordingly,some implementations may include data from other sources into theprocessing graph.

EXAMPLES

Features of the present invention include an execution model that handsover not only the retrieval of the data from a multi-dimensional model,but also OLAP processing of data retrieved from a database to arelational engine. The model allows translation of the OLAP problem intoa relational problem, and therefore works as an interface between theOLAP engine and a relational engine.

The result set from a multi-dimensional model can be considered as a setof rows or table. Nearly all operations necessary to execute OLAPfunctionality can be expressed in a layer model of nodes on multipletiers, where each layer (a node at a particular tier) can be looked atlike a relational view of data. Layers can depend on each other, andother relational data also can contribute to a higher layer. The layerthe multi-dimensional data is retrieved in to map to the most granulardata contained in a query definition is named layer 0 (L0).

Each layer, like a SQL view, may comprise a set of columns that can bederived from deeper layers by Projection, Formulas (including fixedvalues), Aggregation and Grouping, Joining deeper layers and otherrelational data, Filtering on deeper layers, or Sorting data forexample. The SQL Having clause may be expressed as two layers, where thehaving is a filter on columns of the higher layer that are derived byprojection from aggregated columns of the lower layer. The order ofexecution of formulas and aggregation can be defined as in the SQLstandard.

A subquery identifier that indicates that a row in the result set oflayer 0 is part of a subquery result can be treated like a normal columninside the model. It can be used for filtering and projection, forexample, in particular embodiments. A subquery identifier is illustratedin U.S. Patent Publication No. US 2007/0073658 A1 published on Mar. 29,2007, entitled Efficient Handling of Multipart Queries AgainstRelational Data, the entire disclosure of which is incorporated hereinby reference for all purposes.

Table 1 shows an example OLAP query result.

TABLE 1 Country Material #Customers Quantity DE 0001 5 10 0003 3 5 Total6 15 US 0001 3 7 0002 5 11 Total 7 18 Total 11 33

Shown are the dimensions Country, Material and the measure Quantity withaggregation SUM, and a calculated measure #Customers. #Customers iscalculated as the distinct number of customers per combination of Countyand Material. #Customers does not stick to aggregation SUM, as the samecustomers that has bought Material 0001 in Country DE might also havebought Material 0003 in country DE. It is not necessary to distinguishbetween dimensions and measures in the model because there aresituations where, for example, formulas also apply to dimensions, andaggregation operations, such as COUNT DISTINCT, are possible on columnsthat are projections of former dimensions.

The above query may be translated into the relation execution model asfollows and as illustrated in FIG. 5.

Layer 0 (e.g., the root node) at 501 is the layer that retrieves allneeded data from the multidimensional model. Layer 0 can be expressed asa SQL VIEW:

CREATE VIEW L0 AS SELECT COUNTRY, MATERIAL, CUSTOMER, SUM(QUANTITY)“QUANTITY” FROM . . . GROUP BY COUNTRY, MATERIAL, CUSTOMER

Layer 1 at 502 retrieves the quantities per country and material fromlayer 0:

CREATE VIEW L1 AS SELECT COUNTRY, MATERIAL, SUM(QUANTITY) “QUANTITY”FROM L0 GROUP BY COUNTRY, MATERIAL

Layer 2 at 506 retrieves the quantities per country from layer 1:

CREATE VIEW L2 AS SELECT COUNTRY, SUM(QUANTITY) “QUANTITY” FROM L1 GROUPBY COUNTRY

Layer 3 at 507 retrieves the total quantities from layer 2:

CREATE VIEW L3 AS SELECT SUM(QUANTITY) “QUANTITY” FROM L2

Layer 4 at 503 retrieves the #Customers by Country and Material:

CREATE VIEW L4 AS SELECT COUNTRY, MATERIAL, COUNT(DISTINCT CUSTOMER)“NO_CUSTOMERS” FROM L0 GROUP BY COUNTRY, MATERIAL

Layer 5 at 504 retrieves the #Customers by Country:

CREATE VIEW L5 AS SELECT COUNTRY, COUNT(DISTINCT CUSTOMER)“NO_CUSTOMERS” FROM L0 GROUP BY COUNTRY

Layer 6 at 505 retrieves the total of #customers:

CREATE VIEW L6 AS SELECT COUNTRY(DISTINCT CUSTOMER) “NO_CUSTOMERS” FROML0

After definition of the layers, the needed data can be retrieved fromthe layers using SQL statements, such as a nested SQL statementcorresponding to the above VIEWs. In one embodiment, additional SQLfilters may be used as illustrated in the following SQL statement:

SELECT COUNTRY, SUM(QUANTITY) FROM L2 WHERE COUNTRY=‘US’ GROUP BYCOUNTRY.

As another example, conversions, such as currency conversion, can beexpressed via a join to a relational table CC containing the followingcurrency conversion information:

base currency, the target currency, and optionally a keydate and theconversion rate.A currency conversion that converts the data using daily conversionrates of the bookdate and sums up all amounts per customer is shown asfollows:CREATE VIEW L_(X) AS SELECT L_(X-1).CUSTOMER, CC.TARGET_CURRENCY ASCURRENCY, SUM(CC.CONVERSION_RATE*L_(X-1).AMOUNT) FROM L_(X-1) JOIN CC ONL_(X-1).CURRENCY=CC.BASE_CURRENCY AND L_(X-1).BOOKDATE=CC.KEYDATE GROUPBY L_(X-1).CUSTOMER, CC.TARGET_CURRENCY

Enhancing the Execution Model with Exception Aggregation

Standard Aggregation and Exception Aggregation

A multidimensional data model normally specifies a default aggregationfor a measure. For example, measures expressing stock (i.e., inventory)values are normally aggregated using SUM, except over the timedimension.

Exception aggregation may be implemented as a metadata definition, forexample that is attached to a measure. In contrast to the standardaggregation, exception aggregation defines a special aggregation that isapplied by the analytic engine (i.e., the OLAP) when aggregating themeasure over the specified exception aggregation reference dimension.The mentioned stock values are usually aggregated over time usingAVERAGE or FIRST or LAST aggregation, for example.

Table 1 is reproduced again here to illustrate an alternativeimplementation using exception aggregation. Note that the #customersdoes not aggregate across both country and material nor across bothcountries. Accordingly, exception aggregation may be used as describedbelow.

TABLE 1 Country Material #Customers Quantity DE 0001 5 10 0003 3 5 Total6 15 US 0001 3 7 0002 5 11 Total 7 18 Total 11 33

A normal aggregation rule to achieve the expected results for Table mayinclude:

-   -   1. Aggregate the data using standard aggregation up to the        granularity that includes all requested dimensions and the        exception aggregation reference dimension;    -   2. Aggregate the data further up to the granularity that        includes all requested dimensions using the exception        aggregation.        This can be achieved by marking the exception aggregation        reference dimension in our execution model with a special “keep        in groupby” attribute, which prevents the engine executing the        model from removing this groupby column if it is not requested        in the final query.

Using this technique, Table 1 can be expressed by a two layer graphmodel, where the highest layer (here L1) corresponds to a querycomprising exception aggregation as shown in the example below:

CREATE VIEW L0 AS SELECT COUNTRY, MATERIAL, CUSTOMER, SUM(QUANTITY)“QUANTITY” FROM . . . GROUP BY COUNTRY, MATERIAL, CUSTOMER CREATE VIEWL1 AS SELECT COUNTRY, MATERIAL, CUSTOMER (“KEEP IN GROUP_BY”), 1“NO_CUSTOMERS”, FROM L0 GROUP BY COUNTRY, MATERIAL, CUSTOMER

In this example, the above queries correspond to a two layer model wherea first node L1 on a highest tier in the model corresponds to the L1view, which includes an exception aggregation (“KEEP IN GROUP_BY”) forthe dimension CUSTOMER. The exception aggregation allows CUSTOMER to beaggregated in different ways as illustrated below to obtain the resultsshown in Table 1. Node L0 is on a tier below the highest tier, which inthis simple case corresponds to the L1 view.

All numbers in Table 1 can be retrieved by only selecting the data fromthese two layers. For instance, OLAP queries may be received and used toretrieve the desired information from the model as follows. The quantitycolumn can be fetched from View L0 by the following 3 SQL statements:

A1: SELECT COUNTRY, MATERIAL, SUM(QUANTITY) FROM L0 GROUP BY COUNTRY,MATERIAL A2: SELECT COUNTRY, SUM(QUANTITY) FROM L0 GROUP BY COUNTRY A3:SELECT SUM(QUANTITY) FROM L0

The #customers can be fetched from View L1 by the following 3 SQLstatements:

B1: SELECT COUNTRY, MATERIAL, SUM(NO_CUSTOMERS) FROM L1 GROUP BYCOUNTRY, MATERIAL B2: SELECT COUNTRY, SUM(NO_CUSTOMERS) FROM L1 GROUP BYCOUNTRY B3: SELECT SUM(NO_CUSTOMERS) FROM L1

The SQL statements against View L1 are processed by the engineinterpreting the “keep in groupby” attribute. All other unnecessarygroupbys are generically removed by the engine. The resulting executedSQL is the following:

B1: SELECT COUNTRY, MATERIAL, SUM(NO_CUSTOMERS) FROM ( SELECT COUNTRY,MATERIAL, CUSTOMER, 1 “NO_CUSTOMERS” FROM L0 GROUP BY COUNTRY, MATERIAL,CUSTOMER) GROUP BY COUNTRY, MATERIAL

For example, to process B1 in the inner SQL, no groupbys are removed,because COUNTRY and MATERIAL are requested in the outermost selectstatement, and customer is needed as an exception aggregation referencedimension. The above example illustrates that the second query comprisesat least one SQL statement corresponding to the L0 node of the model onthe tier below the highest tier (e.g., the L1 tier) that is nested in aSQL statement corresponding to the first query (e.g., B1: SELECTCOUNTRY, MATERIAL, SUM(NO_CUSTOMERS) FROM L1 GROUP BY COUNTRY,MATERIAL). The following nested queries are similar for B2 and B3:

B2: SELECT COUNTRY, SUM(NO_CUSTOMERS) FROM ( SELECT COUNTRY, CUSTOMER, 1“NO_CUSTOMERS” FROM L0 GROUP BY COUNTRY, CUSTOMER) GROUP BY COUNTRY

To process B2 in the inner SQL, the groupby MATERIAL is removed, as itis not requested in the outermost select statement, and not needed asexception aggregation reference dimension.

B3: SELECT SUM(NO_CUSTOMERS) FROM ( SELECT CUSTOMER, 1 “NO_CUSTOMERS”FROM L0 GROUP BY CUSTOMER)

To process B3 in the inner SQL the groupby MATERIAL and COUNTRY areremoved, as they are not requested in the outermost select statement,and not needed as exception aggregation reference dimension.

As another example, population may be calculated with standardaggregation SUM and exception aggregation MAX over dimension year. Inthis example L0 is as follows:

CREATE VIEW L0 AS SELECT COUNTRY, STATE, YEAR (KEEP IN GROUPBY),SUM(POPULATION) “POPULATION” FROM . . . GROUP BY COUNTRY, STATE, YEAR

In order to get the population of all countries the following query maybe run against the above single layer model:

SELECT COUNTRY, MAX(POPULATION) FROM L0

The above query is processed as follows:

SELECT COUNTRY, MAX(POPULATION) FROM (

-   -   SELECT COUNTRY, YEAR, SUM(POPULATION) “POPULATION” FROM . . .        GROUP BY COUNTRY, YEAR)        group by country        Outside the analytic engine, population appears as having max        aggregation, but it only has max aggregation over time, over all        other dimensions sum aggregation is executed.

Example Hardware

FIG. 6 illustrates hardware of a special purpose computing machineconfigured with a process according to one embodiment of the presentinvention. The following hardware description is merely one example. Itis to be understood that a variety of computers topologies may be usedto implement the above described techniques. An example computer system610 is illustrated in FIG. 6, which shows components of a singlecomputer. Computer system 610 includes a bus 605 or other communicationmechanism for communicating information, and one or more processor(s)601 coupled with bus 605 for processing information. Computer system 610also includes a memory 602 coupled to bus 605 for storing informationand instructions to be executed by processor 601, including informationand instructions for performing the techniques described above, forexample. This memory may also be used for storing variables or otherintermediate information during execution of instructions to be executedby processor 601. Possible implementations of this memory may be, butare not limited to, random access memory (RAM), read only memory (ROM),or both. A storage device 603 is also provided for storing informationand instructions. Common forms of storage devices include, for example,a hard drive, a magnetic disk, an optical disk, a CD-ROM, a DVD, a flashmemory, a USB memory card, or any other medium from which a computer canread. Storage device 603 may include source code, binary code, orsoftware files for performing the techniques above, for example. Storagedevice and memory are both examples of non-transitory computer readablestorage mediums.

Computer system 610 may be coupled via bus 605 to a display 612, such asa cathode ray tube (CRT) or liquid crystal display (LCD), for displayinginformation to a computer user. An input device 611 such as a keyboardand/or mouse is coupled to bus 605 for communicating information andcommand selections from the user to processor 601. The combination ofthese components allows the user to communicate with the system. In somesystems, bus 605 may be divided into multiple specialized buses.

Computer system 610 also includes a network interface 604 coupled withbus 605. Network interface 604 may provide two-way data communicationbetween computer system 610 and the local network 620. The networkinterface 604 may be a digital subscriber line (DSL) or a modem toprovide data communication connection over a telephone line, forexample. Another example of the network interface is a local areanetwork (LAN) interface to provide a data communication connection to acompatible LAN. Wireless links are another example. In any suchimplementation, network interface 604 sends and receives electrical,electromagnetic, or optical signals that carry digital data streamsrepresenting various types of information.

Computer system 610 can send and receive information through the networkinterface 604 across a local network 620, an Intranet, or the Internet630. For a local network, computer system 610 may communicate with aplurality of other computers, such as server 615. In the Internetexample, software components or services may reside on multipledifferent computer systems 610 or servers 631-635 across the network.The processes described above may be implemented on one or more servers,for example. A server 631 may transmit actions or messages from onecomponent, through Internet 630, local network 620, and networkinterface 604 to a component on computer system 610. The softwarecomponents and processes described above may be implemented on anycomputer system and send and/or receive information across a network,for example.

The above description illustrates various embodiments of the presentinvention along with examples of how aspects of the present inventionmay be implemented. The above examples and embodiments should not bedeemed to be the only embodiments, and are presented to illustrate theflexibility and advantages of the present invention as defined by thefollowing claims. Based on the above disclosure and the followingclaims, other arrangements, embodiments, implementations and equivalentswill be evident to those skilled in the art and may be employed withoutdeparting from the spirit and scope of the invention as defined by theclaims.

What is claimed is:
 1. A method comprising: receiving a first query inan online analytic processor (OLAP) executing on one or more computers;generating a model, the model specifying a graph defining a plurality ofnodes and a plurality of tiers, each node corresponding to a differentoperation on data; generating a second query based on the model, thesecond query including a plurality of layered subqueries eachcorresponding to one of the nodes in the graph for specifying thedifferent operations on the data: and receiving the second query in arelational engine coupled to a datastore, wherein the relational engineexecutes the second query, and in accordance therewith, retrieves data.2. The method of claim 1 wherein a first node on a highest tier in themodel corresponds to a subquery comprising an exception aggregation, theexception aggregation associating a first aggregation type with a firstmeasure and a first dimension of a dimension table, the firstaggregation type being different than a default aggregation type used onthe first measure when aggregating said first measure across otherdimensions, wherein the second query comprises at least one SQLstatement corresponding to a second node on a tier below the highesttier nested in a SQL statement corresponding to the first query.
 3. Themethod of claim 1 wherein the OLAP generates and comprises said model,and wherein the second query is generated by said OLAP.
 4. The method ofclaim 1 wherein the plurality of nodes correspond to successiveoperations on the data in the database performed by the relationalengine.
 5. The method of claim 1 wherein the plurality of nodes of saidgraph comprise: a first node on a first tier corresponding to firstdata, wherein the first data is all the data required to answer thefirst query, the first node having a corresponding first subquery forretrieving the first data; and a plurality second nodes on a pluralityof tiers above the first tier, each second node corresponding to adifferent operation on the first data.
 6. The method of claim 1 whereinthe plurality of nodes includes a root node, a plurality of second tiernodes, and one or more higher tier nodes, wherein the root node has acorresponding first subquery to operate on raw data in the database toproduce first data required to answer the first query, wherein theplurality of second tier nodes are coupled to the root node in the graphand each have corresponding subqueries that operate on the first data,and wherein at least one of the second tier nodes is coupled to the oneor more higher tier nodes, wherein each higher tier node has acorresponding subquery for operating on data generated by the subqueryof the at least one second tier node.
 7. The method of claim 1 whereinthe datastore is a database, and wherein the data is stored in thedatabase in at least one fact table and a plurality of dimension tables.8. The method of claim 1 wherein the second query is a nested SQLstatement.
 9. The method of claim 1 wherein each subquery is a databaseview containing a SQL statement.
 10. The method of claim 1 wherein aportion of the nodes of the graph are represented by subqueries executedin the relational engine and one or more of the nodes in the graphcorrespond to data operations executed by said OLAP, the method furthercomprising: returning said retrieved data from the relational engine tothe OLAP; processing the retrieved data in the OLAP based on said one ormore nodes in the graph corresponding to data operations executed bysaid OLAP; and returning a result to said first query.
 11. The method ofclaim 1 wherein the graph specifies an execution plan where operationscorresponding to lower tier nodes are performed on data beforeoperations corresponding to higher tier nodes and wherein data resultingfrom an operation corresponding to a first node on a lower tier isoperated on by a second higher tier node directly connected to the firstnode in the graph.
 12. A system comprising: one or more computers; adatabase; an online analytic processor (OLAP), executing on the one ormore computers, to receive a first query; a model, the model specifyinga graph defining a plurality of nodes on a plurality of tiers, each nodecorresponding to a different operation on data, wherein the OLAPgenerates a second query based on the model, the second query includinga plurality of layered subqueries each corresponding to one of the nodesin the graph for specifying the different operations on data; and arelational engine coupled to a datastore to receive the second query,wherein the relational engine executes the second query, and inaccordance therewith, retrieves data.
 13. The system of claim 12 whereinthe plurality of nodes correspond to successive operations on the datain the database performed by the relational engine.
 14. The system ofclaim 12 wherein the plurality of nodes of said graph comprise: a firstnode on a first tier corresponding to first data, wherein the first datais all the data required to answer the first query, the first nodehaving a corresponding first subquery for retrieving the first data; anda plurality second nodes on a plurality of tiers above the first tier,each second node corresponding to a different operation on the firstdata.
 15. The system of claim 12 wherein the plurality of nodes includesa root node, a plurality of second tier nodes, and one or more highertier nodes, wherein the root node has a corresponding first subquery tooperate on raw data in the database to produce first data required toanswer the first query, wherein the plurality of second tier nodes arecoupled to the root node in the graph and each have correspondingsubqueries that operate on the first data, and wherein at least one ofthe second tier nodes is coupled to the one or more higher tier nodes,wherein each higher tier node has a corresponding subquery for operatingon data generated by the subquery of the at least one second tier node.16. The system of claim 12 wherein the datastore is a database, andwherein the data is stored in the database in at least one fact tableand a plurality of dimension tables.
 17. The system of claim 12 whereinthe second query is a nested SQL statement.
 18. The system of claim 12wherein each subquery is a database view containing a SQL statement. 19.The system of claim 12 wherein a portion of the nodes of the graph arerepresented by subqueries executed in the relational engine and one ormore of the nodes in the graph correspond to data operations executed bysaid OLAP, the method further comprising: returning said retrieved datafrom the relational engine to the OLAP; and processing the retrieveddata in the OLAP based on said one or more nodes in the graphcorresponding to data operations executed by said OLAP; and returning aresult to said first query.
 20. A non-transitory computer readablestorage medium embodying a computer program for performing a method,said method comprising: receiving a first query in an online analyticprocessor (OLAP) executing on one or more computers; generating a model,the model specifying a graph defining a plurality of nodes and aplurality of tiers, each node corresponding to a different operation ondata; generating a second query based on the model, the second queryincluding a plurality of layered subqueries each corresponding to one ofthe nodes in the graph for specifying the different operations on thedata; and receiving the second query in a relational engine coupled to adatastore, wherein the relational engine executes the second query, andin accordance therewith, retrieves data.